A Learning-Based Approach for the Identification of Sexual Predators in Chat Logs

نویسندگان

  • Javier Parapar
  • David E. Losada
  • Alvaro Barreiro
چکیده

The existence of sexual predators that enter into chat rooms or forums and try to convince children to provide some sexual favour is a socially worrying issue. Manually monitoring these interactions is a way to attack this problem. However, this manual approach simply cannot keep pace because of the high number of conversations and the huge number of chatrooms or forums where these conversations daily take place. We need tools that automatically process massive amounts of conversations and alert about possible offenses. The sexual predator identification challenge within PAN 2012 is a valuable way to promote research in this area. Our team faced this task as a Machine Learning problem and we designed several innovative sets of features that guide the construction of classifiers for identifying sexual predation. Our methods are driven by psycholinguistic, chat-based, and tf/idf features and yield to very effective classifiers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining Psycho-linguistic, Content-based and Chat-based Features to Detect Predation in Chatrooms

The Digital Age has brought great benefits for the human race but also some drawbacks. Nowadays, people from opposite corners of the World can communicate online via instant messaging services. Unfortunately, this has introduced new kinds of crime. Sexual predators have adapted their predatory strategies to these platforms and, usually, the target victims are kids. The authorities cannot manual...

متن کامل

Combining Predation Heuristics and Chat-Like Features in Sexual Predator Identification

In this paper we present a system for sexual predator detection which combines two different approaches: a knowledge-based system that makes use of pattern matching according to hand-coded patterns that represent typical predator behaviors, and a learning-based system which employs surface linguistic features like capitalization and chat-like expressions. These approaches are combined in a chai...

متن کامل

A Two-step Approach for Effective Detection of Misbehaving Users in Chats

This paper describes the system jointly developed by the Language Technologies Lab from INAOE and the Language and Reasoning Group from UAM for the Sexual Predators Identification task at the PAN 2012. The presented system focuses on the problem of identifying sexual predators in a set of suspicious chatting. It is mainly based on the following hypotheses: (i) terms used in the process of child...

متن کامل

Concept drift detection in business process logs using deep learning

Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects and etc.), traditional process mining techniques ...

متن کامل

On the Impact of Sentiment and Emotion Based Features in Detecting Online Sexual Predators

According to previous work on pedophile psychology and cyberpedophilia, sentiments and emotions in texts could be a good clue to detect online sexual predation. In this paper, we have suggested a list of high-level features, including sentiment and emotion based ones, for detection of online sexual predation. In particular, since pedophiles are known to be emotionally unstable, we were interest...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012